1.  INTRODUCTION
A.  Review  of  Literature 

Many  studies  have  been  conducted  to  evaluate  the  various  parameters  involved  in 
displaying  conventional  TV  information.  Biberman  (1973)  edited  and  reviewed  an  exten- 
sive body  of  research  on  image  quality  which  provides  the  background  to  predict  the 
effects  of  resolution,  field  of  view,  contrast,  granularity,  signal-to-noise  level,  and  a host  of 
electro-optical  variables  on  human  perceptual  performance. 

Pesch  (1967)  conducted  one  of  the  first  studies  evaluating  mono  TV  vs  stereo  TV 
displays  for  undersea  applications.  He  used  two  tasks  commonly  found  in  underwater 
operations;  the  first  required  direct  interaction  with  an  object,  i.e.,  cable  handling  with  a 
manipulator,  while  the  second  required  spatial  positioning  of  the  manipulator.  His  results 
indicated  that  there  was  no  significant  performance  difference  between  mono  and  stereo 
on  the  spatial  positioning  task.  An  initial  stereo  advantage  on  the  cable  handling  task 
washed out with repeated testing. When visibility was degraded, performance decreased under
the mono condition but not under the stereo condition. Pesch concluded that the advantage
given  by  a stereo  display  is  task  dependent,  is  directly  related  to  the  visual  environment, 
and  may  be  sensitive  to  the  repetitive  nature  of  the  task. 

Hudson  and  Culpit  (1968)  conducted  a study  to  assess  size  and  distance  judgments 
using  mono  and  stereo  TV  displays  at  different  signal-to-noise  ratios.  Their  main  results 
indicated  that  with  highly  practiced  subjects,  there  was  no  significant  stereo  advantage 
for  targets  located  20  to  200  feet  from  the  observer. 

A number  of  studies  were  supported  by  NASA  to  evaluate  mono  TV  and  stereo  TV 
displays  under  visibility  conditions  encountered  in  space  exploration.  In  the  first  of  these, 
Huggins,  Malone  and  Shields  (1973)  measured  detection  and  recognition  thresholds  with  a 
mono  TV  system  and  compared  performance  on  a distance  estimation  task  using  both 
mono  and  stereo  TV  systems.  Their  results  indicate  that  for  a standard  525-line  system,  the 
smallest  object  detectable  requires  about  2 TV  scan  lines  and  subtends  about  5 arc-minutes. 
For  form  recognition,  angular  targets  are  easier  to  recognize  than  circles  or  hexagons; 
image  size  must  be  25-35  arc-minutes.  On  the  distance  estimation  task,  best  performance 
was  obtained  with  two  orthogonal  cameras  located  in  the  horizontal  plane.  Both  mono  and 
stereo performance improved when they employed a camera angle 45° above the horizontal
plane.  No  differences  in  performance  were  found  for  the  various  types  of  TV  displays  they 
tested. 

Shields, Kirkpatrick, Malone and Huggins (1975) directed their study toward a determination
of stereo range resolution using mono and stereo TV*. They found that “range resolution”
(judgment of the minimum offset of 2 rods for a given distance) could be improved by
decreasing the viewing distance to the display, increasing the stereo baseline (camera
separation), or using longer focal length lenses. The authors argue for natural perspective
stereo (or orthographic), where both the convergence angle and retinal disparity are the
same for the direct view and the monitor view (however, they did not employ natural
perspective in their study).


Grant, Meirick, Polhemus, Spencer, Swain and Tewell (1973) investigated display system
parameters with a Fresnel stereo display. Although they did not obtain stereo comparisons
with mono, important findings regarding camera separation, convergence angle, and field of
view were reported. Task time was employed as a measure of performance, where the operator
was required to place various blocks in a receptacle. The results indicate that performance
is unchanged as a function of separation between left and right cameras for all but extreme
separation positions (6-, 12-, and 18-inch separations resulted in similar task times, while
24 inches degraded performance). Field of view was tested from 5° to 30° with a convergence
angle of 4.8° and a camera separation of 6 inches. Best performance occurred with 10° to 17°
fields of view. The authors recommend a display system with a 2.5-inch camera separation, a
6.8° convergence angle, and a variable 9° to 54° field of view (zoom lens) for their
particular space application (even though they did not evaluate camera separation closer
than 6 inches).
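
The camera settings quoted above can be related through simple geometry. The sketch below is a minimal illustration, assuming two cameras toed in symmetrically so that their optical axes cross at a single convergence point. This symmetric geometry and the function name `convergence_distance` are assumptions made for illustration and are not taken from Grant et al.

```python
import math

def convergence_distance(separation_in, convergence_deg):
    """Distance (inches) from the camera baseline to the point where the
    two optical axes cross, assuming symmetric toe-in (an assumption)."""
    half_angle = math.radians(convergence_deg) / 2.0
    return (separation_in / 2.0) / math.tan(half_angle)

# Settings quoted in the text: the tested configuration and the recommended one.
tested = convergence_distance(6.0, 4.8)
recommended = convergence_distance(2.5, 6.8)
print(f"tested: {tested:.1f} in, recommended: {recommended:.1f} in")
```

Under this assumption the tested configuration converges at roughly 72 inches and the recommended one at roughly 21 inches, so the recommended settings would correspond to a considerably closer working distance than the one evaluated.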

Tewell,  Ray,  Meirick  and  Polhemus  (1974)  compared  mono  TV  with  a Fresnel  stereo 
TV  on  a depth  alignment  task  using  different  camera  locations  and  objects  of  varying  size 
and  shape.  The  results  indicated  that  with  equal  sized  targets,  stereo  performance  was 
better  than  mono  performance  by  a factor  of  2 under  all  camera  locations  tested.  However, 
when  rectangles  of  unequal  size  were  aligned  (i.e.,  when  size  cues  were  absent),  stereo 
performance  was  further  improved  over  mono  by  a factor  of  5.  In  a second  study,  they 
made  a comparison  between  a two-view  mono  system  and  the  same  Fresnel  stereo  system 
with  variations  of  camera  position  and  lighting.  Tasks  consisted  of  inserting  a wooden 
block  into  a hole  that  was  1/16”  larger  than  the  block  and  placing  a metal  drawer  into  a 
3/16”  larger  guide.  For  both  tasks,  the  locations  were  offset  0°,  45°,  and  90°  to  the  hori- 
zontal. The  authors  state  that  they  were  unable  to  draw  conclusions  comparing  mono  and 
stereo  performance,  due  to  kinesthetic  feedback  which  enabled  the  subjects  to  accomplish 
the  tasks  without  any  visual  reference. 

In the final NASA-supported study reported here, Crooks, Freedman and Coan (1975)
evaluated  a wide  variety  of  display  systems.  Using  four  manipulator  tasks  (positioning, 
coupling,  docking,  and  obstacle  clearance),  performance  was  compared  with  black  and 
white  mono,  color  mono,  a two-view  system  in  black  and  white  mono,  and  an  anaglyph 
(color  separated)  stereo  TV  system.  The  authors  selected  tasks  which  they  reasoned  were 
representative  of  all  possible  operator  tasks.  The  two  relevant  dimensions  of  concern  were 
the manipulator-object relationships and the size of the work volume. Tasks were classified
into two levels of element relationship (connection/docking and transportation/clearance)
and two sizes of working spaces (large and small).

*The stereo system employed by this group is one of several excellent techniques for presenting
a different video image to each of the two eyes, thus achieving stereopsis. A description of the
technique is given in Grant, et al (1973).



A visual  function  analysis  was  conducted  to  determine  the  visual  scene  parameters  and 
basic  perceptual  operations  necessary  to  complete  the  tasks.  This  study  is  unusual  in  that  it 
was  the  first  attempt  to  evaluate  the  dimensions  of  the  visual  environment  in  conjunction 
with  TV  display  evaluations.  Using  a task-analysis  method  to  determine  the  scene  para- 
meters, the  dimensions  identified  were  (1)  object  differentiation  — discriminability  of 
objects  based  upon  differences  in  brightness,  color,  size,  shape,  etc.,  (2)  depth  precision  — 
a summary  of  depth  cues,  such  as  perspective,  interposition,  parallax,  (3)  reference  — the 
cues  contributing  to  perception  of  an  object's  orientation  and  spatial  position  within  the 
scene,  and  (4)  scene  dynamics  - primarily  the  amount  of  motion  present.  The  results  of 
this study indicate that the coupling and positioning tasks had smaller position errors,
fewer contact errors, and required less time than the docking and clearance tasks. The use
of a color TV display did not improve performance in any of their particular task-scene
conditions, nor did the use of a stereo display. A significant improvement in positioning
accuracy was obtained when two mono cameras were orthogonally located in the same
horizontal plane. Relatively little effect was found for scene parameters, dynamics, or
depth precision. The authors concluded their study by assigning a burden factor to each
TV system, comparing them on the basis of cost, weight, volume, power, maintainability
and reliability. Recommendations for a two-view (orthogonally positioned) black and
white display system are made, based on the results of this study.

In the previously cited study by Hudson and Culpit, the authors stated that they had
selected a stereo system which employed a polaroid-separation technique after having
determined that the anaglyph, or color separated stereo method was ineffective due to the
human eye’s response to color of different wavelengths and due to the transmission
characteristics of available color filters. They reasoned that a color stereo system using
anaglyphs would be even less successful than the previously described system. Unfortunately,
the anaglyph system is the type of stereo display employed by Crooks et al. It is not known
to what extent this factor affected the results of an otherwise comprehensive study. Thus,
any conclusions related to stereo performance based on the Crooks et al. data must be
evaluated with caution.

Zamarin (1976a) critically reviewed six studies which were concerned with stereo
display applications. He concluded that a thorough parametric evaluation of stereo should
include the following independent variables: viewing system (stereo TV, mono TV, direct
viewing with the unaided eye), camera parameters (stereo baseline separation, convergence
angle, field of view), display parameters (display size, viewing distance, resolution,
brightness and contrast), stereo channel characteristics (size match, brightness match,
vertical alignment, rotational alignment), perspective relationships, and subject variables.


In a second volume of this report, Zamarin (1976b) reported comparative operator
performance using three viewing systems (mono TV, cross-polaroid stereo TV, and
field-sequential* stereo). He also studied the effects of stereo baseline (camera separation),
camera convergence angle, and field of view for the two stereo systems. His results
indicated that with a 3-rod depth discrimination task, camera separation affected accuracy
of performance but not response time (a separation of 7 to 10 inches resulted in the best
performance). Camera convergence angle had no significant effects; that is, it made no
difference if the convergence point was ahead of, behind, or on the target. Changes in the
field of view (magnification of 1.0x, 1.25x and 2.0x corresponding to 28°, 22.6° and
14.25°, respectively) had no significant effects in the depth judgment task. Finally, the
Polaroid and the field sequential displays resulted in similar performance with both
yielding an error factor that was 1/2 to 1/3 that of the mono TV display.

*An old, well-proven technique used in map-plotting equipment with rotational mechanical
shutters to alternately present left and right eye views; the equipment evaluated by Zamarin
uses new electro-optics.
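
The reported field-of-view figures are consistent with a simple lens model. As a rough check, the sketch below assumes a rectilinear lens in which magnification scales the focal length, so that the tangent of the half-angle shrinks in proportion. The function name `field_of_view` is hypothetical, and the lens model is an assumption rather than something stated by Zamarin.

```python
import math

def field_of_view(base_fov_deg, magnification):
    """Full field of view (degrees) after magnifying a rectilinear lens
    whose unmagnified field of view is base_fov_deg (assumed model)."""
    half = math.radians(base_fov_deg) / 2.0
    return 2.0 * math.degrees(math.atan(math.tan(half) / magnification))

# Magnifications quoted in the text, starting from the 28-degree field.
for m in (1.0, 1.25, 2.0):
    print(f"{m}x -> {field_of_view(28.0, m):.2f} deg")
```

Under this model the computed values fall within about a tenth of a degree of the quoted 22.6° and 14.25° fields.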

There  was  no  mono  field  of  view  assessment.  Direct  view  stereo  acuity  was  10  arc- 
seconds,  while  comparable  TV  stereo  acuity  was  about  25  arc-seconds,  or  a factor  of  about 
2.5  poorer.  No  comparable  measures  of  direct  mono  and  mono  TV  acuity  were  obtained. 

Zamarin  reports  that  the  field  sequential  and  Polaroid  stereo  systems  provided  com- 
parable levels  of  performance  in  the  3-rod  depth  task  despite  major  illumination  differences 
between  them.  He  suggests  that  the  Fresnel  system’s  reduced  vertical  resolution  (although 
not  relevant  to  this  rod  task)  may  have  degrading  effects  which  would  be  most  likely  to 
occur when viewing a complex scene, i.e., one involving more varied contrasts and shapes.


B.  Implications  for  Research 

Of  the  eight  studies  reviewed  here,  six  directly  compared  task  performance  using  stereo 
and  mono  TV  displays.  Four  of  the  six  concluded  that  stereo  provides  no  significant  per- 
formance advantage.  This  fact,  when  considered  in  light  of  other  features  of  the  review, 
appears  somewhat  contradictory.  That  is,  the  literature  suggests  that  the  state-of-the-art  for 
producing televised images is high. One can select the appropriate physical conditions and
electrical  components  to  produce  high  quality  images.  The  visual  scene  parameters  have 
been  determined  as  a function  of  working  space,  camera  coverage,  etc.,  and  much  experi- 
mental work  has  been  done  to  understand  human  perceptual  performance  under  a variety 
of  televised  conditions.  Add  to  this  the  fact  that  a long  past  history  of  laboratory  research 
and  applied  work  in  human  visual  perception  clearly  shows  that  stereopsis  is  a major  bene- 
fit for  tasks  requiring  spatial  localization  (Graham,  1965)  and  for  search  and  recognition 
tasks where the visual scene must be interpreted and conclusions obtained (Merritt, 1977a).
Thus,  under  televised  conditions,  it  would  appear  that  performance  employing  stereo  view- 
ing would  be  significantly  better  than  under  mono  viewing,  principally  because  monocular 
cues  are  reduced  much  more  than  binocular  disparity,  the  fundamental  stereo  cue. 

What  is  the  basis  for  this  discrepancy?  The  possibilities  which  present  themselves  are 
these: 

(a)  Comparisons  of  stereo-mono  have  been  conducted  under  optimal  visibility 
conditions  where  mono-stereo  differences  might  be  at  a minimum. 

(b)  Because  of  unrealistic  visual-perceptual  task  situations  (i.e.,  clearly  defined  targets 
and  only  small  differences  in  the  plane  of  location),  only  small  stereo  advantages  would  be 
predicted. 

(c) Failure to control learning effects which occur with repeated trials, especially with
a static  task  scene,  would  ultimately  result  in  performance  relatively  independent  of  visual 
feedback. 

(d)  Poor  quality  stereo  displays  (compared  to  a mono  display)  and  the  interaction  of 
poor  quality  displays  with  (a),  (b)  and  (c)  could  well  result  in  inferior  stereo  performance.* 

In light of the review showing that stereo displays offer no consistent performance
advantage,  a careful  analysis  of  these  variables  associated  with  the  remote  performance  of 
manipulator  tasks  is  in  order.  We,  therefore,  address  these  factors  in  the  following  sections. 


2.  ANALYSIS  OF  PERFORMANCE  FACTORS 
A.  Visibility  Factors 

One  of  the  most  promising  hypotheses  to  be  tested  in  our  laboratory  is  the  idea  that 
stereo  TV  provides  a greater  advantage  over  mono  TV  as  visibility  conditions  deteriorate. 
This  has  been  true  in  the  interpretation  of  aerial  photography,  where  stereo  becomes  essen- 
tial when  the  imagery  is  degraded  by  haze,  low  luminance,  graininess,  and  so  on.  Similar 
reductions  in  normal  visibility  can  occur  as  a result  of  the  particulate  matter  in  the  sea- 
water. The  back  scattering  of  light  caused  by  particles  in  water  creates  a condition  of 
veiling  luminance  which  acts  to  reduce  the  contrast  between  the  object  of  interest  and  the 
scene  background.  Particles  also  create  visual  noise,  which  contributes  to  a reduction  in 
picture  resolution.  Finally,  the  movement  and  settling  of  particles  on  the  ocean  floor 
create  a cover  which  reduces  edge  and  contour  details  of  objects,  essentially  camouflaging 
them  from  view  (Merritt,  1977b).  We,  therefore,  have  proposed  a research  program  that 
will  test  operator  performance  on  a number  of  remote  manipulation  tasks  under  three 
levels of visibility: (a) clear (the best image quality that could be obtained with laboratory
TV  systems),  (b)  medium  visibility  degradation  and  (c)  heavy  visibility  degradation. 

The  properties  of  closed-circuit  TV  systems  make  the  problem  of  specifying  visibility 
different  from  the  usual  optical  measurement  paradigm.  The  TV  operator  can  compensate 
for  a low  contrast  image  at  the  camera  faceplate  by  adjusting  gamma  or  gain  in  the  camera, 
or  by  adjusting  brightness  and  contrast  in  the  monitor.  This  permits  expansion  of  a light 
gray and a dark gray into full black and white with a contrast transfer better than 100% at
the  monitor  screen.  There  is  a limit  to  this  type  of  contrast  enhancement,  however,  and 
when  a given  camera/monitor  system  has  reached  its  limit,  a gray  and  washed-out  image 
may be the best an operator has to work with. Each combination of TV camera/monitor,
lighting/geometry, and water properties will show a different quality of image on the monitor,
and thus, a given screen image quality cannot be linked to a particular attenuation
coefficient or scattering coefficient. What counts in the final analysis is the image delivered
to the operator, and this is the image which we propose to vary in quality in order to
determine the effects of visibility on performance. The image on the TV monitor will be
measured in terms of luminance of the imaged reproduction of a known target placed in front
of the cameras. Specifications for setting up the proper brightness and contrast on the TV
monitor will ensure that all subjects receive the same visual input for each of the conditions.

*Note: The engineering effort and optical precision necessary to maintain a good quality stereo image
far exceed that required for a conventional display. Most people are unable to recognize poor stereo
or even reversed stereo, yet they are quick to notice deficiencies in conventional TVs. Julesz (1977)
recently reported that in a quality control task where the operator used a binocular microscope to
check for defective integrated circuits, the majority of the operators were unable to maintain
appropriate stereo calibration of the microscope. To overcome this deficiency, Julesz placed
random-dot stereograms on test plates of i.c.’s so that the operator could verify the binocular alignment.

The  most  promising  way  to  relate  levels  of  visibility  used  in  our  research  to  under- 
water optics  is  through  the  powerful  and  versatile  method  of  modulation  transfer  function 
(MTF)  analysis.  We  can  assume  that  the  MTF  of  the  overall  system  is  equivalent  to  that 
used  by  our  research  subjects  and  is  given  by  the  system’s  MTF  curve.  When  a remotely 
manned  system  in  the  real  world  encounters  water  conditions  which  interact  with  its  imag- 
ing system  to  produce  a particular  quality  of  imagery  on  the  monitor,  then  operator 
performance can be predicted by the MTF of the monitor image. See Funk, Bryant and
Heckman (1972) for an application of the factors affecting the monitor characteristics.
Backscatter is the primary degrading factor in most remote system operations in the
underwater environment, and is even more exaggerated in those systems that use their own
illuminant sources. It is fairly easy to simulate and measure backscatter, since the MTF of
a veiling luminance is simply a straight line showing equal contrast reduction for all spatial
frequencies, regardless of the fineness of detail or the size of a dark area. Mertens (1970)
provides an excellent and extensive treatment of the various component MTFs which
cascade to produce the final overall system MTF in the underwater imaging situation.
Since backscatter effects cause a veiling luminance which reduces contrast of both large
and small detail equally, it can be controlled by means of the camera/monitor controls
for gain and contrast. These can be adjusted to set the luminance levels of a white and
black square calibrated by a luminance meter.
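
The flat MTF of a veiling luminance, and the way component MTFs cascade, can be illustrated numerically. The sketch below is a minimal illustration, assuming Michelson contrast as the contrast measure and a spatially uniform veil added to every point of the scene. The function names and all luminance and MTF values are hypothetical and are not measurements from this program.

```python
def michelson_contrast(l_max, l_min):
    """Michelson contrast of a light/dark pair of luminances."""
    return (l_max - l_min) / (l_max + l_min)

def veiling_mtf(mean_luminance, veil_luminance):
    """Contrast multiplier produced by a uniform veil. It does not depend
    on spatial frequency, so the MTF is a flat line at this value."""
    return mean_luminance / (mean_luminance + veil_luminance)

def cascade(*component_mtfs):
    """Multiply component MTF curves sampled at the same spatial frequencies."""
    system = [1.0] * len(component_mtfs[0])
    for curve in component_mtfs:
        system = [s * c for s, c in zip(system, curve)]
    return system

white, black, veil = 80.0, 20.0, 50.0      # hypothetical luminances
clear = michelson_contrast(white, black)
veiled = michelson_contrast(white + veil, black + veil)
factor = veiling_mtf((white + black) / 2.0, veil)

lens_mtf = [1.0, 0.8, 0.5]                 # hypothetical values at three frequencies
system_mtf = cascade(lens_mtf, [factor] * len(lens_mtf))
print(clear, veiled, factor, system_mtf)
```

With these illustrative numbers the veil halves the contrast of the test squares, and the same factor of one half multiplies every point on the lens curve, which is why a single white/black luminance measurement is enough to characterize this kind of degradation.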


While  veiling  luminance  conditions  produce  the  major  visibility  degrading  conditions 
underwater, other factors contribute to problems associated with scene “interpretability”
or “perceptibility”. They are: (1) visual noise produced by large particles which are
disturbed by vehicle thrusters or dislodged from objects by working manipulators, and (2) the
way marine growth and siltation may camouflage objects which are being searched for
visually via remote-viewing systems.


Because  of  these  considerations,  we  feel  it  is  important  to  go  beyond  simulation  and 
performance  testing  of  the  usual  optical  limits  of  contrast  transfer  (MTF  analysis)  under- 
water and  extend  our  research  to  consider  particle  noise  and  camouflage  effects  as  well. 


B.  Task  Factors 


A second  item  of  major  importance  in  this  research  is  the  type  of  task  that  the  opera- 
tor must  perform.  For  example,  a major  demand  on  the  operator  has  to  do  with  interpreting, 
or making sense out of, details within the scene. This is the cognitive process which
occurs in making a judgment about the identification of and the spatial position of various
components within the scene. (If the operator is viewing the scene in mono, the analysis of
the scene can be time consuming and demanding.) A statement often heard from operators
of remotely manned systems when using conventional TV is that, “we just couldn't
figure out what we were looking at”. Few of the studies reviewed considered this or other
levels of analysis demanded of the operator in the sequence of tasks he has to carry out.
Nearly all situations require that the vehicle operator find an object (or working area),
recognize the orientation of the object with respect to the bottom and the vehicle, and
then position the vehicle in such a way that operations specific to the use of the
manipulator can be carried out. Thus, a general analysis might take the following form:

1. Where is the target area located in space? The answer provides the operator with
information  to  position  the  vehicle  or  camera  in  order  to  obtain  the  best  viewing  location 
possible  and  to  center  the  object  on  the  visual  display. 

2.  What  is  the  object?  This  is  basically  a discrimination  task.  The  operator  must  know 
the  characteristics  of  the  object  and  must  be  able  to  discriminate  it  from  all  other  objects 
in the environment. The objects could be encrusted with natural camouflage, e.g., barnacles,
plant growth, or fish.

3.  What  operations  must  be  performed  on  the  object?  The  answer  to  this  question  will 
determine  the  spatial  positioning  required  to  bring  to  bear  the  appropriate  manipulators 
and  specific  tools  at  the  operator’s  disposal. 


While  the  problems  encountered  by  the  vehicle  operator  in  accomplishing  the  first  two 
steps  of  the  sequence  described  above  are  not  uninteresting,  most  laboratory  research 
efforts  typically  commence  at  step  3.  We  have  followed  this  convention,  at  least  in  the 
initial  stages  of  our  research. 

Our  approach  was  to  conduct  an  analysis  of  the  task  factors  involved  at  step  3 in  order 
to  determine  the  fine-grain  perceptual-motor  requirements  imposed  on  the  operator  by  the 
combination  of  the  task  mission  and  visibility  conditions.  We  first  constructed  an  inventory 
of  operator  tasks  based  on  a review  of  the  literature,  evaluation  of  video  tapes  from  actual 
undersea  operations  (extracted  from  the  NOSC  video  tape  library,  including  ROWS,  WSP, 
and  the  integrated  LOSS  operations),  and  interviews  with  experienced  NOSC  remote 
vehicle  operators.  These  tasks  were  then  assigned  to  one  of  three  categories  based  on 
commonalities  of  their  major  perceptual-motor  requirements.  Finally,  we  selected  one  task 
from  each  of  the  three  categories  which,  based  on  conclusions  from  our  work,  was  most 
representative  of  a class  of  tasks.  The  task  categories,  the  three  selected  tasks,  and  selection 
rationale  are  presented  below. 


Category  1.  Tasks  in  this  category  include  drilling,  stud  gun  firing,  tapping,  threading, 
removing  bolts,  and  connecting-disconnecting  couplers.  These  tasks  have  several  common 
elements;  they  all  require  critical  alignment  of  the  object  of  interest  and  the  end-effector. 
They appear to require little or no change in the sagittal plane, and several of them require
rotational positioning superimposed on the attitude alignment factor. The major underlying
characteristic of these tasks appears to be represented by a task developed by Hill and
his research group at SRI. (See Figure 1.) This task has been termed “Peg-in-hole”, and
variations can be employed which systematically increase the degree of constraint for
attitude alignment, and rotational and translational movements of a fitting task. (See Hill,
1977, for further details.)

Category  2.  Cutting  a cable,  attaching  a J-hook,  and  attaching  a cable  clamp  are 
examples  of  Category  2 tasks.  These  tasks  all  require  relatively  large  movements  of  the 
end-effector or tool in the sagittal plane in order to acquire the cable or attachment point.
Implicit in the performance of these tasks is the recognition of the appropriate position
point or cable, the proper alignment of the tool with respect to other objects, and the
potential hazards associated with contacting the manipulator with adjacent or otherwise
impeding cables, lines, or objects. This task involves the perception of changes in the sagittal
depth plane, as well as in the horizontal position. Selecting appropriate routes through a
maze of potentially interfering components may make this task extremely difficult or
impossible without the depth information conveyed by stereo. We are in the process of
designing an appropriate task which includes these essential features.

Category  3.  Line  feeding,  simple  attachment,  and  manipulator  positioning  for  sample 
retrieval  are  representative  of  Category  3 tasks.  These  tasks  require  the  perception  and 
utilization of information regarding changes in the sagittal plane, as well as in the horizontal
plane. While positioning requires attitude and depth alignment, the relative seriousness of
conflicting visual components is not as severe as in Category 2. Therefore, the available
monocular cues may be used to greater advantage without the potentially hazardous effects
associated with the above described task. We have constructed a model which is composed
of multiple loops arranged in a complex configuration. Pilot runs with this apparatus
produce satisfactorily reliable data for response time and errors.




Visibility-Task  Interaction.  While  there  are  many  factors  which  contribute  to  reduced 
visibility  in  the  underwater  world,  their  effects  on  performance  are  dependent  on  the  type 
of  visual  information  that  is  necessary  to  perform  the  specific  task.  Thus,  poor  visibility 
might differentially lower performance on a task that requires spatial judgment and a
major  difference  between  mono  and  stereo  might  be  evidenced.  However,  on  tasks  involv- 
ing scene  interpretation,  mono  TV  performance  is  very  poor,  even  with  good  visibility. 
Thus,  conditions  which  degrade  visibility  may  well  not  have  a measurable  effect  on  mono 
performance  because  of  a “floor”  effect.  In  other  words,  if  things  are  all  that  bad,  they 
can’t  get  worse.  Performance  would  be  expected  to  improve  with  the  use  of  stereo  in  both 
spatial  localization  and  interpretation  tasks,  especially  when  employing  a stereo  display 
system  which  does  not  degrade  resolution  and  might  even  enhance  it  due  to  an  improved 
signal-to-noise  ratio  in  the  fused  stereo  percept.  In  these  cases,  stereo  might  be  expected  to 
provide  an  excellent  advantage,  since  in  stereo,  the  noise  is  uncorrelated  while  the  target 
objects are correlated; noise is not able to significantly disrupt the binocular depth
percepts. The random dot stereo patterns of Julesz (1971) model the randomly distributed
marine  growth  and  siltation  which  camouflage  objects  on  the  ocean  bottom.  Just  as  the 
random  elements  in  Julesz’s  stereo  patterns  are  discarded  by  the  brain,  the  randomly  dis- 
tributed marine  growth  elements  will  not  hide  objects  from  observers  employing  stereo 
displays.  Since  the  observer  need  not  recognize  an  object  from  monocular  cues  before  seeing 
its  shape,  the  stereo  registration  will  still  convey  depth,  curvature  and  shape  information. 
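
The random-dot argument can be made concrete with a small simulation. The sketch below is a minimal illustration in the style of a random-dot stereogram, not Julesz's own procedure: each image alone is unstructured random dots, yet a square region carries a horizontal disparity that a comparison of the two images can recover. The function names, image size, and the brute-force shift search are assumptions for illustration only and are not a model of human stereopsis.

```python
import random

def random_dot_stereogram(size=64, square=24, disparity=4, seed=0):
    """Return (left, right) binary dot images. A central square is shifted
    horizontally by `disparity` pixels in the right image only."""
    rng = random.Random(seed)
    left = [[rng.randint(0, 1) for _ in range(size)] for _ in range(size)]
    right = [row[:] for row in left]
    lo = (size - square) // 2
    hi = lo + square
    for y in range(lo, hi):
        for x in range(lo, hi):
            right[y][x - disparity] = left[y][x]   # shifted copy of the square
        for x in range(hi - disparity, hi):
            right[y][x] = rng.randint(0, 1)        # refill the uncovered strip
    return left, right

def best_shift(left, right, rows, cols, max_shift=8):
    """Horizontal shift that maximizes dot-for-dot agreement in a region."""
    def agreement(shift):
        return sum(left[y][x] == right[y][x - shift] for y in rows for x in cols)
    return max(range(max_shift + 1), key=agreement)

left, right = random_dot_stereogram()
inside = best_shift(left, right, range(20, 44), range(20, 44))
outside = best_shift(left, right, range(0, 20), range(20, 44))
print(inside, outside)
```

With the default settings the search recovers a shift of 4 pixels inside the square and 0 in the background above it, even though neither image contains a visible outline of the square. This is the sense in which correlated structure can survive a covering of random elements when two views are compared.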


C.  Learning  Factors 

Learning  is  a pervasive  phenomenon  which  occurs  under  both  real  world  and  labora- 
tory conditions.  Generally,  the  performance  improvement  which  results  from  repeated 
trials  is  attributed  to  learning.  A careful  analysis  is  required  to  describe  the  particular 
dimensions  of  the  learning  components  of  specific  tasks.  However,  a general  summary  of 
the  outcome  of  repetition  would  conclude  that  perceptual-motor  links  are  established, 
motor-skills  which  are  basic  to  the  controller-manipulator  operation  are  learned,  and  often 
a discrete  spatial  mapping  of  the  visual  task  area  becomes  associated  with  the  operators’ 
actions  in  controlling  the  manipulator. 

In  the  real  world,  many  tasks  require  repetition  or  successive  approximation  simply 
because  trial  and  error  is  the  final  irreducible  strategy  employed  by  the  operator  under 
most  conditions.  One  must  recognize  that  the  opportunity  for  trial  and  error  learning  is 
costly  (either  in  operating  time,  or  in  increasingly  risky  or  unsafe  operating  conditions). 
Thus,  while  the  learning  effect  is  not  unimportant  in  the  real  world,  it  is  in  the  laboratory 
that  special  concern  for  this  phenomenon  is  developed.  This  concern  is  largely  due  to  the 
frequent  use  of  repeated  trial  designs  which  contribute  greatly  to  the  reliability  of  the 
subject’s data. These learning effects are usually controlled using appropriate experimental
design  considerations  to  distribute  the  effects  across  the  major  factors  of  the  experiments. 
It  is  our  contention,  however,  that  both  theoretical  and  practical  considerations  require  a 
closer  look  at  learning  effects  and  their  interaction  with  task  and  visibility  parameters.  As 
an  example,  Pesch  (1967)  reported  differences  in  mono  and  stereo  performances  which 
“washed  out”  with  repeated  testing.  In  order  to  develop  an  appreciation  for  the  way  in 
which  learning  effects  become  important  in  research  of  this  kind,  the  results  of  Pesch’s 
study  will  be  examined  with  particular  attention  focused  on  possible  learning  factors 
influencing  performance. 

Pesch employed two tasks which differed with respect to the levels of perceptual-
motor  feedback  required  for  performance.  One  task  was  a dynamic  cable  handling  task 
which  required  the  operator  to  make  manipulator  contact  with  the  environment,  while  the 
other  was  a simple  spatial  positioning  task  apparently  requiring  little  visual  feedback  to 
monitor  performance.  Performance  for  these  tasks  was  measured  under  augmented  and 
non-augmented  visual  conditions.  The  augmented  feedback  consisted  of  a visual  background 
which  included  objects  of  known  size,  highlights  and  shadows.  These  cues  are  involved  in 
making size and distance judgments. In the non-augmented condition, a homogeneous
bottom,  with  uniform  illumination  levels,  produced  no  shadows;  only  the  test  object  was 
visible  in  the  operator’s  view. 

The  results  indicate  that  in  the  cable  handling  task,  stereo  performance  was  consistently 
better  than  mono  during  day  one,  regardless  of  the  visibility  conditions.  Performance  on 
the  spatial  positioning  task  showed  no  mono/stereo  differences. 

During  the  second  day,  stereo  performance  remained  the  same  on  the  cable  handling 
task  (suggesting  that  a “ceiling  effect”  was  operating  to  limit  improvement),  while  mono 
performance  improved  under  both  (augmented  and  non-augmented)  visibility  conditions. 
The  improvements  in  performance  were  not  equivalent.  Under  augmented  conditions, 
mono  performance  equalled  that  obtained  with  stereo.  However,  under  non-augmented 
visibility,  mono  performance  still  remained  inferior  to  stereo  performance.  This  result  is 
especially  evident  in  the  error  score  data,  and  is  possibly  related  to  the  differential  (mono- 
stereo)  visual  feedback  conditions  which  facilitate  learning  the  task. 

The  results  of  this  study  point  immediately  to  the  importance  of  understanding  how 
learning  can  have  differential  effects  on  performance  resulting  from  both  task  and  visibility 
factors.  They  additionally  indicate  at  least  one  area  of  interaction  between  these  variables. 


3.  SUMMARY  OF  FIRST  YEAR’S  WORK 

During  FY  1977  a research  project  was  initiated  under  ONR  sponsorship.  The  primary 
purpose  was  to  explore  the  utility  of  stereoscopic  television  as  a visual  aid  in  remotely 
manned  undersea  vehicle  operations. 

The  first  year  effort  was  directed  toward  establishing  baseline  mono  and  stereo 
performance  using  tasks  of  perceptual  judgment.  An  additional  goal  was  the  selection  and 
development  of  tasks  which  involve  hand-eye  coordination  and  the  perception  of  object 
location in space (i.e., tasks which require the operator to employ continuous visual
feedback to position a manipulator appropriately).

Three  studies  were  conducted  to  evaluate  operator  performance  with  conventional  and 
stereo display systems (Pepper, Cole and Smith, 1977). The first two studies involved
perceptual judgment tasks (a modified Howard-Dolman depth discrimination test, and Julesz’s
test of stereopsis deficiency); the third study employed a perceptual motor task requiring
end-effector positioning and closure. This third task was designed to approximate
components of an undersea object recovery task.

The results of studies one and two are presented in Table 1. Stereo acuity performance
as measured by the Howard Dolman apparatus indicates that both stereo TV display
systems provide adequate information to enhance performance over that given with a
conventional mono display. Study two indicates that stereopsis thresholds obtained with Julesz
patterns are equal using the Fresnel system and the Field Sequential system, and are no
different from thresholds obtained while viewing Julesz patterns directly without viewer
aids.


Table 1. Comparison of Stereo Acuity Performance on Two Display Systems
as Measured by Random Dot (%Binocularity) and Howard Dolman
(Angular Disparity) Tasks

Display             %Binocularity            Angular Disparity
                    (means of 4 subjects)    in min/sec of arc
                                             (means of 5 subjects)

Direct              49.09                    11.97
Fresnel             429.35                   220.57
Field Sequential    426.26                   191.32

In  study  three,  conventional  TV  was  compared  with  the  Field  Sequential  stereo  system 
in  displaying  a perceptual  motor  task.  Eight  males  and  one  female  were  used  in  this  study. 
Four  of  the  males  had  prior  experience  using  manipulators  with  non-stereo  TVs.  The  female 
subject  was  employed  to  assess  the  overall  effects  of  experience  with  repeated  testing.  A 
task board was fabricated to provide 11 simulated link attachment points over an area of
4,648.2 sq. cm (61.0 x 76.2 cm). Figure 2 shows the various spatial orientations and
elevations of the links from 0 to 30.5 cm above the testing platform. The task board was painted
a medium  gray,  with  splatter  paint  and  random  dark  splotches  added  to  simulate  visual 
camouflage  due  to  marine  growth  and  sediment. 




Figure  2.  Link  taskboard. 


A CRL-Model-L master-slave manipulator was employed to achieve the appropriate
end-effector positioning and closure response. Subjects were instructed to position the
manipulator as quickly as possible and to grasp the link (previously indicated by the
experimenter) with a minimum of errors. To reduce the opportunity for spatial learning of the
link positions, the task board was systematically rotated to four different positions during
the course of each 100-trial sequence. The time elapsed between the “start” signal and grasping
of the correct link constituted the time score. Errors were counted as any contact with
links (except grasping of the correct link) or other parts of the board, or incorrect closures.
Mean time and error scores are shown in Table 2. Analysis of variance results indicate that
the use of a stereo display significantly reduces both response time and errors on this
perceptual-motor task. There was no significant difference between the experienced and
the inexperienced subjects.


Table 2. Mean Task Completion Times and Mean
Errors for Mono and Stereo TV Displays

                             Mono               Stereo
                        Errors    Time     Errors    Time

Experienced (N=4)        1.75     7.20      1.23     4.31
Inexperienced (N=5)      2.05     7.42      0.65     4.63
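
As a worked illustration of the size of the stereo advantage, the following minimal sketch computes the percentage reduction in mean errors and mean time from the Table 2 values. It assumes the flattened table reads in the order mono errors, mono time, stereo errors, stereo time; the dictionary layout and function names are illustrative only, and the time unit is not stated in this section.

```python
# Mean scores transcribed from Table 2 (column order is an assumption:
# mono errors, mono time, stereo errors, stereo time).
TABLE_2 = {
    "experienced": {
        "mono": {"errors": 1.75, "time": 7.20},
        "stereo": {"errors": 1.23, "time": 4.31},
    },
    "inexperienced": {
        "mono": {"errors": 2.05, "time": 7.42},
        "stereo": {"errors": 0.65, "time": 4.63},
    },
}


def percent_reduction(mono, stereo):
    """Reduction from the mono score to the stereo score, as a percentage of mono."""
    return 100.0 * (mono - stereo) / mono


def stereo_advantage(table):
    """Percent reduction in each measure for each subject group, rounded to 0.1."""
    return {
        group: {
            measure: round(
                percent_reduction(
                    displays["mono"][measure], displays["stereo"][measure]
                ),
                1,
            )
            for measure in ("errors", "time")
        }
        for group, displays in table.items()
    }
```

Under that reading, mean completion time falls by roughly 40 percent for the experienced group and roughly 38 percent for the inexperienced group when stereo is used. These are descriptive ratios of the tabled means only, not a substitute for the analysis of variance reported in the text.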

The naive subject who conducted repeated tests over five days initially showed the
poorest performance compared to all other subjects, but by day 5 was performing as well
as the experienced operators. Figure 3 presents the five-day results. It can be seen that a
learning effect does occur despite the spatial rotation of the task board and the
counterbalancing procedures employed. This effect is probably due to the subject acquiring
motor-skills with the manipulator and to learning the four positions of the board. Note that the
functions described in Figure 3 evidenced no interaction effect; that is, improvement in
performance occurs equally under both mono and stereo conditions.


Figure 3. Mono vs stereo link task time scores (A),
and error scores (B) with repeated testing.

From  the  first  year’s  research  activities,  it  seems  clear  that  there  is  a loss  of  depth 
information  when  a three-dimensional  arrangement,  such  as  the  Howard-Dolman  apparatus, 
is televised (either in mono or stereo), but not when random dot patterns displayed on
two-dimensional cards are televised. Although there was no difference in performance on either
task due to TV systems used, the systems differ in a number of ways. The Fresnel display
appeared to give higher image resolution, and thus a more sharply focused image which
subjectively seemed to be less prone to a deterioration in picture quality than the Field Sequential
display. This deterioration was due primarily to a reduction in brightness and
the flicker introduced by the glasses of the Field Sequential display. The fact that the
Fresnel display required a rigid, fixed posture appeared to result in greater fatigue than the
free head positioning permitted by the Field Sequential display. Although it was not a
factor in these two studies, the restrictions of head position would have an additional
disadvantage in perceptual-motor tasks where large arm or body movements are required. It was
mainly  for  this  latter  reason  that  the  Field  Sequential  display  was  selected  for  use  with  the 
perceptual-motor  task  in  study  three. 

4.  CURRENT  EFFORTS 

Our  current  efforts  are  concentrated  on  developing  reliable  controls  of  the  visibility 
and  task  factors  that  were  described  in  detail  above  and  in  adapting  experimental  designs 
and  statistical  analyses  to  permit  assessment  of  learning  effects  and  their  interactions  with 
visibility  and  task  factors.  The  near  term  goal  is  to  compare  performance  using  mono  vs 
stereo  displays  under  high,  medium  and  low  visibility  conditions  for  the  three  types  of 
tasks  previously  described.  The  hypothesis  underlying  this  line  of  inquiry  is  that  conditions 
which  degrade  visibility  will  contribute  differentially  to  mono  and  stereo  perceptual 
performance. Figure 4 presents this prediction graphically. Regardless of how well an
operator might perform under clear conditions with a mono display, his performance will
fall off at a greater rate than with a comparable stereo display as visibility is degraded. The
basis for this prediction is the fact that mono cues of relative height and size, linear
perspective, and light and shadow will be lost before binocular disparity (the cue which
underlies stereo) is lost.


Figure 4. Predicted mono vs stereo performance
under three levels of visibility.

In  a pilot  experiment  to  test  the  above  hypothesis,  two  subjects  were  run  under 
conditions  of  high  and  low  visibility  using  mono  and  stereo  TV  to  position  a manipulator. 
Fifteen  trials  in  each  condition  resulted  in  the  performance  depicted  in  Figure  5.  The  data 
points  are  the  arithmetic  mean  values  for  each  of  the  four  treatment  conditions.  These 
preliminary  data  suggest  that  there  is  an  increase  in  stereo  advantage  as  visibility  is  degraded. 
This  result  is  consistent  with  the  notion  that  mono  and  stereo  performance  is  disrupted 
differentially  by  degraded  visibility.  Further  exploration  of  this  phenomenon  is  underway 
using  a more  complex  version  of  the  link  task,  a larger  subject  sample,  and  a more  rigorous 
visibility  simulation  procedure.  Since  completing  the  pilot  project,  we  have  developed  a 
method  which  permits  a trial-to-trial  change  of  visibility  levels  (contrast  ratios)  at  the  TV 
monitor  while  holding  overall  luminance  levels  constant. 
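
The report does not describe how the contrast change was implemented. As a hedged digital analogue only, the idea of lowering contrast while holding overall luminance constant can be sketched by scaling each pixel's deviation from the mean luminance. The function names and the use of Michelson contrast as the contrast measure are assumptions made for illustration.

```python
def scale_contrast(pixels, factor):
    """Scale deviations from mean luminance by `factor`.

    factor = 1 leaves the frame unchanged; factor = 0 gives a uniform field
    at the mean luminance. The mean luminance itself is unchanged.
    """
    mean = sum(pixels) / len(pixels)
    return [mean + factor * (p - mean) for p in pixels]


def michelson_contrast(pixels):
    """(max - min) / (max + min), one common definition of contrast."""
    lo, hi = min(pixels), max(pixels)
    return (hi - lo) / (hi + lo)
```

For a hypothetical frame with luminances 20, 60, 100 and 140, `scale_contrast(frame, 0.5)` halves the contrast while the mean stays at 80. The sketch ignores clipping and display nonlinearity, both of which a real monitor-level procedure would need to handle.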


Figure 5. Operator performance on peg-in-hole task.


5.  FUTURE  DIRECTIONS  AND  IMPLICATIONS 

In  our  first  year’s  work,  several  ideas  were  identified  for  consideration  and  future 
investigation.  The  first  of  these  was  the  fact  that  performance  may  be  enhanced  with 
increased  resolution  on  the  CRT  display.  This  factor  apparently  was  involved  in  the  studies 
comparing  the  Fresnel  and  Field  Sequential  displays.  That  is,  our  subjects  reported  that  the 
picture  quality  was  significantly  better  with  the  Fresnel  than  the  Field  Sequential  display. 
Grant  et  al  (1973)  indicate  that  a primary  advantage  of  the  Fresnel  stereo  display  over  all 
[...] is theoretically limited to 6.35 cm (the interpupillary distance) regardless of display
size, a pupil-spreading technique employing a lenti[...] appears to enable vertical
head motion on the order of 30 cm. [...] experimental work with this system [...]
reported. We plan to obtain and evaluate this system systematically.

Implications 

1. Reduced search time for location of target. This [...] unfamiliar
objects or those obscured by other objects or sediment.

[...] visibility conditions, (b) unfamiliar
or obscured targets, and (c) task conditions which [...] feedback and single
operation tasks where trial and error is unavailable to provide immediate perceptual motor
learning experiences.